# HW5 – adding DMem

[Adding lw & sw instructions to the CPU instruction set: Rtype (no jr), addi, beq, bne, j]



## **HW4\_top** (Rtype MIPS)



## HW5\_top (HW4 MIPS + lw & sw)




# **HW4\_MIPS** control scheme



## HW5\_MIPS control scheme



New control signals appear in blue

#### The HW5\_top CPU has five phases.

**IF** – Instruction Fetch, which is carried out inside the Fetch Unit. It produces the instruction in IR\_reg at the rising edge of the clock, which ends the IF phase and starts the ID phase.

**ID** – Instruction Decode, which is the phase in which we do the following:

- Decode the instruction now residing in IR\_reg and decide what should be done. This means we produce all control signals to be used by that instruction in all of its phases: ID, EX, MEM and WB.
- Read Rs into A\_reg and Rt into B\_reg.

The rising edge of the clock that samples data into A\_reg and B\_reg ends the ID phase and starts the EX phase.

**EX** – Execute, which is the phase in which the ALU calculates the result of A op B (in *Rtype* instructions) or A+sext\_imm (in *addi* instructions). The result is sampled into ALUout\_reg at the rising edge of the clock, which ends the EX phase and starts the MEM phase. In this phase we also select Rt or Rd as the GPR file destination register to be written in the Write Back phase.

**MEM** – Memory, which is the phase in which we read from the DMem (in *lw*) or write into it (in *sw*). At the rising edge of the clock the read data is sampled into MDR\_reg (*lw*) or the write data is sampled into the DMem (*sw*). That clock edge ends the MEM phase and starts the WB phase.

**WB** – Write Back, which is the final phase of the instruction. If this is an *Rtype* or *addi* instruction, we write ALUout\_reg\_pWB into the GPR file. In *lw* we write the MDR\_reg value into the GPR file. If this is a *j*, *beq* or *bne* instruction, we do nothing at that stage. The rising edge of the clock that samples data into the GPR file ends the WB phase and completes the execution of the instruction.

#### General signals in HW5 top

(new signals shown in **RED**)

CK - The 25 MHz clock coming out of the Clock\_driver.

RESET – coming out of the Host\_Intf and is used as reset signal to all registers

HOLD – coming out of the Host\_Intf and is used to freeze writing into all FFs & registers

#### ID phase signals in HW5 top

IR\_reg – a 32 bit register that has the instruction we read from the IMem. This signal is a rename of the IR\_reg\_pID signal coming out of the modified Fetch Unit.

Opcode – the 6 MSBs of IR\_reg. To be decoded to produce the control signals.

Rs - IR[25:21].

Rt - IR[20:16].

Rd - IR[15:11].

Funct – IR[5:0].

sext\_imm - renaming of sext\_imm\_pID coming out of the Fetch Unit.

GPR\_rd\_data1 – the 32 bit output of the rd\_data1 of the GPR and input to A\_reg.

GPR\_rd\_data2 – the 32 bit output of the rd\_data2 of the GPR and input to B\_reg.

Rs\_equals\_Rt - '1' if GPR\_rd\_data1== GPR\_rd\_data2, and '0' otherwise.

Used in branch instructions. That signal (renamed) is sent to the Fetch Unit.

#### <u>ID control signals in HW5 top</u> - These are created from decoding the opcode:

ALUsrcB – '1' when sext\_imm is used (in addi instruction).

ALUOP – a 2 bit signal. "00" means add (addi inst.), "01" means subtract (not used), "10" will cause the ALU to follow the Funct field.

RegDst – '0' when we WB according to Rt (addi inst.), '1' when we WB according to Rd (Rtype inst.).

RegWrite – '1' when we WB (Rtype or addi inst.), '0' when we don't (j, beq & bne inst.)

**MemWrite** – '1' in sw (writing to DMem).

**MemToReg** – '1' in lw (reading from DMem).
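The signal descriptions above can be sketched as a small combinational decoder. This is only an illustration under stated assumptions, not the required solution: the entity name `ID_decode_sketch` is hypothetical (in the assignment this logic lives inside HW5\_top), the opcode constants are the standard MIPS encodings, and the sketch assumes that lw and sw use sext\_imm for the address calculation in the same way addi does.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone decoder, for illustration only.
entity ID_decode_sketch is
  port (
    Opcode   : in  std_logic_vector(5 downto 0);  -- the 6 MSBs of IR_reg
    ALUsrcB  : out std_logic;
    ALUOP    : out std_logic_vector(1 downto 0);
    RegDst   : out std_logic;
    RegWrite : out std_logic;
    MemWrite : out std_logic;
    MemToReg : out std_logic
  );
end ID_decode_sketch;

architecture rtl of ID_decode_sketch is
  -- Standard MIPS opcode encodings
  constant OP_RTYPE : std_logic_vector(5 downto 0) := "000000";
  constant OP_ADDI  : std_logic_vector(5 downto 0) := "001000";
  constant OP_LW    : std_logic_vector(5 downto 0) := "100011";
  constant OP_SW    : std_logic_vector(5 downto 0) := "101011";
begin
  -- Assumption: lw/sw add sext_imm to Rs to form the DMem address
  ALUsrcB  <= '1' when Opcode = OP_ADDI or Opcode = OP_LW or Opcode = OP_SW else '0';
  -- "10" follows the Funct field; "00" (add) is used otherwise
  ALUOP    <= "10" when Opcode = OP_RTYPE else "00";
  -- '1' selects Rd (Rtype), '0' selects Rt
  RegDst   <= '1' when Opcode = OP_RTYPE else '0';
  -- Only instructions that write the GPR file assert RegWrite
  RegWrite <= '1' when Opcode = OP_RTYPE or Opcode = OP_ADDI or Opcode = OP_LW else '0';
  MemWrite <= '1' when Opcode = OP_SW else '0';
  MemToReg <= '1' when Opcode = OP_LW else '0';
end rtl;
```

For j, beq and bne every output here is '0' (ALUOP is "00"), so none of them writes the GPR file or the DMem.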

#### EX phase signals in HW5 top

A\_reg – a 32 bit register receiving the GPR\_rd\_data1 signal. Its value is used in the EX phase.

B\_reg – a 32 bit register receiving the GPR\_rd\_data2 signal.

sext\_imm\_reg - a 32 bit register receiving the sext\_imm coming from the Fetch Unit

Rt\_pEX – Rt delayed by 1 clock cycle

Rd\_pEX – Rd delayed by 1 clock cycle

ALUoutput – a 32 bit signal of the output of the ALU (renaming of ALU\_out signal coming out of the MIPS\_ALU component).

#### EX phase control signals in HW5 top

ALUsrcB\_pEX – ALUsrcB delayed by 1 clock cycle.

Funct\_pEX – Funct delayed by 1 clock cycle.

ALUOP\_pEX – ALUOP delayed by 1 clock cycle.

RegDst\_pEX – RegDst delayed by 1 clock cycle.

RegWrite\_pEX – RegWrite delayed by 1 clock cycle.

**MemWrite\_pEX** – MemWrite delayed by 1 clock cycle.

**MemToReg\_pEX** – MemToReg delayed by 1 clock cycle.

#### MEM phase signals in HW5 top

**B\_reg\_pMEM** – a 32 bit register receiving the B\_reg signal (i.e., B\_reg delayed by 1 CK). This register has the data to be written into the DMem in sw instruction.

**Rd\_pMEM** – the output of the RegDst mux, selecting which register the CPU writes to in the WB phase.
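A minimal sketch of the destination-register selection follows. It assumes the mux works on Rt\_pEX and Rd\_pEX under control of RegDst\_pEX during the EX phase, and that its output is sampled into Rd\_pMEM at the clock edge that ends EX. The entity name `regdst_sketch`, the internal signal `dst_mux`, and the polarities of RESET and HOLD (both active high, with asynchronous reset) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fragment, for illustration only.
entity regdst_sketch is
  port (
    CK, RESET, HOLD : in  std_logic;
    RegDst_pEX      : in  std_logic;
    Rt_pEX, Rd_pEX  : in  std_logic_vector(4 downto 0);
    Rd_pMEM         : out std_logic_vector(4 downto 0)
  );
end regdst_sketch;

architecture rtl of regdst_sketch is
  signal dst_mux : std_logic_vector(4 downto 0);
begin
  -- RegDst = '1' selects Rd (Rtype), '0' selects Rt (addi)
  dst_mux <= Rd_pEX when RegDst_pEX = '1' else Rt_pEX;

  -- Sample the selected register number at the end of the EX phase
  process (CK, RESET)
  begin
    if RESET = '1' then            -- assumed active-high asynchronous reset
      Rd_pMEM <= (others => '0');
    elsif rising_edge(CK) then
      if HOLD = '0' then           -- assumed: HOLD = '1' freezes the register
        Rd_pMEM <= dst_mux;
      end if;
    end if;
  end process;
end rtl;
```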

#### MEM phase control signals in HW5 top

**MemWrite\_pMEM** - MemWrite\_pEX delayed by 1 clock cycle.

**MemToReg\_pMEM** – MemToReg\_pEX delayed by 1 clock cycle.

**RegWrite\_pMEM** – RegWrite\_pEX delayed by 1 clock cycle.

#### WB phase signals in HW5 top

MDR\_reg- a 32 bit register that has the data read from the memory. Rename of DMem\_rd\_data signal coming out of the HW5\_Host\_Intf\_4sim component.

**ALUout\_reg\_pWB** – a 32 bit register that has the ALUout\_reg data delayed by 1 CK cycle.

**GPR\_wr\_data** – a 32 bit signal that is the output of the MemToReg mux (selecting between MDR\_reg and ALUout\_reg\_pWB).

**Rd\_pWB** – Rd\_pMEM delayed by 1 clock cycle.
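The MemToReg mux that drives GPR\_wr\_data can be sketched as a single concurrent assignment. The entity name `wb_mux_sketch` is hypothetical; the selection follows the definition of MemToReg ('1' in lw, so the memory data is chosen).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fragment, for illustration only.
entity wb_mux_sketch is
  port (
    MemToReg_pWB   : in  std_logic;
    MDR_reg        : in  std_logic_vector(31 downto 0);
    ALUout_reg_pWB : in  std_logic_vector(31 downto 0);
    GPR_wr_data    : out std_logic_vector(31 downto 0)
  );
end wb_mux_sketch;

architecture rtl of wb_mux_sketch is
begin
  -- lw writes the data read from the DMem; Rtype and addi write the ALU result
  GPR_wr_data <= MDR_reg when MemToReg_pWB = '1' else ALUout_reg_pWB;
end rtl;
```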

#### WB phase control signals in HW5 top

**MemToReg\_pWB** – MemToReg\_pMEM delayed by 1 clock cycle.

**RegWrite\_pWB** – RegWrite\_pMEM delayed by 1 clock cycle.
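All of the "delayed by 1 clock cycle" signals follow the same pattern: a chain of registers that carries a control bit alongside its instruction from phase to phase. The sketch below shows the chain for RegWrite and MemToReg only. The entity name `ctrl_pipe_sketch` is hypothetical, and the reset style (asynchronous, active high) and HOLD polarity (active high) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fragment, for illustration only.
entity ctrl_pipe_sketch is
  port (
    CK, RESET, HOLD : in  std_logic;
    RegWrite        : in  std_logic;  -- ID phase value from the opcode decoding
    MemToReg        : in  std_logic;  -- ID phase value from the opcode decoding
    RegWrite_pWB    : out std_logic;
    MemToReg_pWB    : out std_logic
  );
end ctrl_pipe_sketch;

architecture rtl of ctrl_pipe_sketch is
  signal RegWrite_pEX, RegWrite_pMEM : std_logic;
  signal MemToReg_pEX, MemToReg_pMEM : std_logic;
begin
  process (CK, RESET)
  begin
    if RESET = '1' then              -- assumed active-high asynchronous reset
      RegWrite_pEX  <= '0';
      RegWrite_pMEM <= '0';
      RegWrite_pWB  <= '0';
      MemToReg_pEX  <= '0';
      MemToReg_pMEM <= '0';
      MemToReg_pWB  <= '0';
    elsif rising_edge(CK) then
      if HOLD = '0' then             -- assumed: HOLD = '1' freezes all registers
        -- Each stage samples the value of the previous stage
        RegWrite_pEX  <= RegWrite;
        RegWrite_pMEM <= RegWrite_pEX;
        RegWrite_pWB  <= RegWrite_pMEM;
        MemToReg_pEX  <= MemToReg;
        MemToReg_pMEM <= MemToReg_pEX;
        MemToReg_pWB  <= MemToReg_pMEM;
      end if;
    end if;
  end process;
end rtl;
```

Because all assignments are signal assignments inside one clocked process, each register sees the old value of the previous stage, so a control bit takes three clock edges to travel from ID to WB.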

You get a **HW5\_top\_4sim.empty** file in which all of these signals are defined. You have to add your design of the HW5\_MIPS, i.e., write the equations of the top file. In this vhd file we use the **Fetch\_Unit**, **GPR**, **MIPS\_ALU**, **Clock\_Driver** and **BYOC\_Host\_Intf\_4sim** components.

#### **Description of the HW5\_top\_4sim project**

- 1. HW5\_top\_4sim.vhd This is your design of HW5. It uses the GPR, the MIPS\_ALU, the updated Fetch\_Unit, the BYOC\_Clock\_driver\_4sim and the BYOC\_Host\_Intf\_4sim components and all of the signals described in 2b.
- **2. GPR.vhd** your GPR File design you prepared in HW3.
- **3.** dual\_port\_memory.vhd part of the GPR File design you prepared in HW3.
- **4.** MIPS\_ALU.vhd your MIPS\_ALU design you prepared in HW3.
- **5. Fetch\_Unit.vhd** The Fetch Unit you prepared in HW2 after the modifications detailed in HW4.
- **6. BYOC\_Clock\_driver\_4sim.vhd** the CK divider & driver we use for simulation (also good for the Modelsim simulator)
- **7. BYOC\_Host\_Intf\_4sim.vhd** The prepared components including the IMem and "preloaded" program and creating the reset & ck signals.
- **8. SIM\_HW5\_TB.vhd** The TB vhd file prepared in advance. See the note in 9 below.
- **9.** SIM\_HW5\_TB\_data.dat this is a data file prepared in advance that is read by the SIM\_HW5\_TB and used to compare the simulation results to the expected ones.
- 10. SIM\_HW5\_program.dat The program file for simulation.
- 11. SIM\_HW5\_filenames.vhd The actual path information of the two dat files.

**NOTE:** In **SIM\_HW5\_filenames.vhd** we specified the path of the **dat** file. You should update that according to your simulation project actual path.

#### **Simulation report**

You should submit a single zip file for the Simulation and implementation phases. It should have two directories/folders. The first is called **Simulation**, the 2<sup>nd</sup> is called **Implementation**.

In the **Simulation** folder you will have 3 sub-folders:

- Src\_4sim here you put all of the \*.vhd sources and the \*.dat file (to be used by the TB)
- Sim here you should have the HW5\_4sim project created by the simulator you used
- Docs Here you put your simulation report. The first few lines in the report will have your ID numbers (names are optional). See the instructions in BYOC\_HW5.doc. <u>This should be a WORD file and not a PDF file so remarks can be added when grading the report</u>.

Later, in the Implementation phase you will add 3 sub-folders to the **Implementation** folder.

These will be:

- Src\_4ISE here you put all of the \*.vhd sources and the \*.ucf file (and no TB file)
- ISE here you should have the HW5\_top project created by the Xilinx ISE SW.
- DOCs Same as in simulation. See instructions in BYOC\_HW5.doc

#### **Simulation report (cont)**

The first part of the program we have in the IMem in HW5 simulation (addresses 400000h to 4001ACh) is very similar to the one you had in HW4 and is meant to check that you did not break anything with the changes you made. That part is tested by the TB. The SIM\_HW5\_TB.dat file contains test data only for this part of the IMem program.

The rest of the IMem program, from 4001B0h till the end (400330h), is meant to test the writing and reading from the DMem. The program is given in Appendix A at the end of the HW5 document. It can also be seen in the last part of the SIM\_HW5\_program.dat file.

In order to test your design you need to look at the simulated waveforms and decide whether it is OK or not. You should run the simulation for 10.5us (10500ns). In the doc file you need to attach screen captures describing this part of the simulation you made, as detailed in 3.1.

### **Simulation report (cont)**

3.1) The signals listed below should be presented in the screen captures you attach to your report. Show clock cycles **196-224** (counting from the end of the reset pulse, find i=**196-224**) and make the values of all signals readable. For this you will probably need to show clocks **196-210** and **210-224** separately. These are the signals that can help you in "testing" the DMem.

```
Note that some of these signals are inside the Host Intf:
( i=196-224 means CK cycles from 8440 ns to 9600 ns )
CK
RESET
HOLD
i (the serial no. of the clock cycle – created by the TB)
MIPS_DMem_adrs
MIPS_DMem_wr_data
MIPS_DMem_we
MIPS_DMem_rd_data
DMem_reg0
DMem_reg1
DMem_reg2
DMem_reg3
DMem_reg4
PC_reg
IR_reg
```
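The quoted time window can be checked with a short calculation. With the 25 MHz CK the period is 40 ns. The 600 ns offset before cycle i=0 is not stated in the text; it is inferred here from the two quoted times, so treat it as an assumption and confirm it in your own waveform.

$$t_{\text{start}}(i) = 600\,\text{ns} + 40\,\text{ns}\cdot i, \qquad t_{\text{end}}(i) = t_{\text{start}}(i) + 40\,\text{ns}$$

$$t_{\text{start}}(196) = 600 + 7840 = 8440\,\text{ns}, \qquad t_{\text{end}}(224) = 600 + 8960 + 40 = 9600\,\text{ns}$$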

### **Simulation report (cont)**

3.2) Explain in detail what happens, i.e., what do we see here. Note that it is essential to the success of your future design that you will verify that the design does what we wanted it to do in these CK cycles.

In that doc file you need also to answer the following questions:

- 3.3) What is the latency of Rtype instruction? How many nop-s should be inserted between two consecutive Rtype instructions if the 2<sup>nd</sup> one uses the result of the 1<sup>st</sup> one?
- 3.4) Explain the limitation of beq that tests a register that is calculated by an Rtype instruction. As an example, translate the following C for loop: for (i=0;i<10;i++) { ... } where i is register \$3.
- 3.5) Are there any other limitations due to the pipeline structure in the instructions we implemented (Rtype, addi, beq, bne, j, lw, sw)? How can we overcome these limitations (e.g., by adding nop-s)?

# Now let's talk about the Implementation phase

# Your pipelined MIPS




The memories (in yellow) are part of the BYOC\_Host\_Intf

# Your MIPS with the BYOC\_Host\_Intf



We load the IMem via the BYOCIntf SW, then also use it to apply a single ck and check the rdbk signals



- We can "see" the DMem on a VGA screen
- What does this mean?



# The VGA screen

A word, 32 bits, is 32 BW pixels. Bit 0 is on the left.



- 13x315=4095, i.e., 13 words per row times 315 rows gives 4095 words for the "screen memory"
- Addresses 20000000h-20003FFCh
- +52 (34h) means going down one row (13 words x 4 bytes = 52 bytes)

All rights belong to Daniel Seidner
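The screen-memory layout can be sketched as an address calculation: the word at a given row and word column sits at the base address plus 4 bytes per word, with 13 words per row. The package name `vga_adrs_sketch`, the constant names and the function `word_adrs` are hypothetical and serve only as an illustration of the arithmetic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package, for illustration only.
package vga_adrs_sketch is
  constant SCREEN_BASE   : unsigned(31 downto 0) := x"20000000";
  constant WORDS_PER_ROW : natural := 13;   -- 13 words x 32 pixels per row
  constant SCREEN_ROWS   : natural := 315;

  -- Byte address of the 32-pixel word at (row, col), col counted in words
  function word_adrs(row : natural; col : natural) return unsigned;
end vga_adrs_sketch;

package body vga_adrs_sketch is
  function word_adrs(row : natural; col : natural) return unsigned is
  begin
    -- 4 bytes per word; one full row is 13 * 4 = 52 (34h) bytes
    return SCREEN_BASE + to_unsigned(4 * (row * WORDS_PER_ROW + col), 32);
  end word_adrs;
end vga_adrs_sketch;
```

With these definitions, `word_adrs(0, 0)` is 20000000h and `word_adrs(1, 0)` is 20000034h, which is the "+52 means going down" rule: adding 34h to an address moves one pixel row down in the same word column.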

#### The simulation version



## The implementation version



All the i/o pins in the HW5\_top entity were not really used in simulation (except the CK\_50MHz). In implementation we use all of the i/o shown in blue.

```
entity HW5_top is
port (
-- Host intf signals - Infrastructure signals
RS232_Rx     : in    STD_LOGIC;
RS232_Tx     : out   STD_LOGIC;
-- VGA signals
VGA_h_sync   : out   STD_LOGIC;
VGA_v_sync   : out   STD_LOGIC;
VGA_red0     : out   STD_LOGIC;
VGA_red1     : out   STD_LOGIC;
VGA_red2     : out   STD_LOGIC;
VGA_grn0     : out   STD_LOGIC;
VGA_grn1     : out   STD_LOGIC;
VGA_grn2     : out   STD_LOGIC;
VGA_blu1     : out   STD_LOGIC;
VGA_blu2     : out   STD_LOGIC;
-- Flash Mem signals
MT_ce_n      : out   STD_LOGIC; -- '0' when accessing Nexys2 SDRAM - not used
Flash_adrs   : out   STD_LOGIC_VECTOR(23 downto 1); -- Flash read/write address
Flash_ce_n   : out   STD_LOGIC; -- '0' when accessing Flash mem
Flash_we_n   : out   STD_LOGIC; -- '0' when writing to Flash mem
Flash_oe_n   : out   STD_LOGIC; -- '0' when reading from Flash mem
Flash_rp_n   : out   STD_LOGIC; -- '0' when resetting Flash mem
Flash_sts    : in    STD_LOGIC; -- '1' when Flash mem FSM is done
Flash_data   : inout STD_LOGIC_VECTOR(15 downto 0); -- Data from/to Flash to/from IMem/DMem
-- KBD signals
PS2C         : in    STD_LOGIC; -- PS2 keyboard clock
PS2D         : in    STD_LOGIC; -- PS2 keyboard data
-- general signals
leds_out     : out   STD_LOGIC_VECTOR(7 downto 0); -- 7=Flash_stts, 6=MIPS_ck, 5-0=Host_Intf version
CK_50MHz     : in    STD_LOGIC;
buttons_in   : in    STD_LOGIC_VECTOR(3 downto 0); -- btn0=single clock, btn3=manual reset
switches_in  : in    STD_LOGIC_VECTOR(7 downto 0); -- to be described later
sevenseg_out : out   STD_LOGIC_VECTOR(6 downto 0); -- to the 7 seg LEDs
anodes_out   : out   STD_LOGIC_VECTOR(3 downto 0)  -- to the 7 seg LEDs
---- signals to be tested by the TB -- REMOVED FOR IMPLEMENTATION!!
);
end HW5_top;
```

#### **HW5\_top - implementation**

- 1. Take your **HW5\_top\_4sim.vhd**, remove all TB signals and rename it to **HW5\_top.vhd**. You can look at the difference between the **HW5\_top\_4sim.empty** and the **HW5\_top.empty** files that were given to you.
- 2. You should replace the **BYOC\_Host\_Intf\_4sim.vhd** with a component that looks the same, the **BYOC\_Host\_Intf.ngc**, which has the infra-structure that allows the PC to load data into the IMem via the RS232 by the **BYOCInterface** SW.
- 3. The files we will use to implement the design on the Nexys2 board are:
- **BYOC.ucf** The file listing which signals are connected to which FPGA pins in the Nexys2 board.
- **HW5\_top.vhd** This is your design of HW5
- Fetch\_Unit.vhd The Fetch Unit you prepared in HW2 after modifications of HW4
- GPR.vhd your GPR File design you prepared in HW3.
- **dual\_port\_memory.vhd** part of the GPR File design you prepared in HW3.
- MIPS\_ALU.vhd your MIPS\_ALU design you prepared in HW3.
- BYOC\_Clock\_driver.vhd the CK divider & driver we also used in HW2 & HW4.
- BYOC\_Host\_Intf.ngc The actual infrastructure interfacing the PC.
- 4. Now run the Xilinx ISE SW, create a HW5\_top.bit file and test it.

#### <u>HW5\_top – testing the implemented design</u>

We'll run the **BYOCInterface** SW and load the IMem. Then run the circuit in single ck mode and check that the readings we see at the points we "hooked" to the rdbk signals are what we expect.

The file we want to load into the IMem is called "HW5\_rect4.txt". The file itself includes all the information required in order to load it into the IMem and switch to a single ck mode. Following the loading, we can run in single ck mode and see the readback values on the PC screen after each clock. The HW5\_rect4 program is given in Appendix B at the end of the BYOC\_HW5.doc document.

This time we would like to connect a VGA screen to the Nexys2 board.

- What is the value of register \$2 after 122 cks?
- What happens after 126 CKs?
   (To answer these two questions, look at what happens from 120 CKs till 130 CKs)
- What happens when you press the RUN button?

#### <u>HW5\_top – implementation report</u>

You should submit a single zip file for the Simulation and implementation phases. It should have two directories/folders. The first is called **Simulation**, the 2<sup>nd</sup> is called **Implementation**. In the **Implementation** directory you should have 3 sub-directories:

- Src\_4ISE here you put all of the \*.vhd sources and the \*.ucf file (and no TB file)
- ISE here you should have the HW5 project created by the Xilinx ISE SW.
- **Docs** Here you put your implementation report. The first few lines in the report will have your ID numbers (names are optional). See the instructions in BYOC\_HW5.doc.

<u>This should be a WORD file and not a PDF file</u> – so remarks can be added when grading the report.

In this part of the implementation report you should answer the following questions:

- 1. What is the value of register \$2 after 122 cks?
- 2. What happens after 126 CKs? (To answer these two questions, look with the BYOCSIntf SW at what happens from 120 CKs till 130 CKs)
- 3. What happens when you press the RUN button?
- 4. Explain the HW5\_rect4 program (what is the job of every register used. What is done in each loop, etc.)
- 5. How long does it take [in seconds] to draw a 32x32 white square when we use the draw loop of the **HW5\_rect4** program?
- 6. Can you shorten the loop? If you can, write the code and explain.
- 7. Can you think of a faster way to draw the square in the same short loop? If you can, write the code and explain.

As part of completing this part of the course you will have to show me how you run the design on the Nexys2 board in the lab. And maybe answer some questions.

# **Enjoy the assignment!**

Thanks for listening!